I. INTRODUCTION


A. GENERAL 

The broad general subject area of ocean currents has been 
the recipient of increasing investigational efforts and mone- 
tary expenditures. Research has encompassed the spectrum of 
ocean-current related subjects from the enormous task of de- 
termining the effects should a whole current system (Gulf 
Stream, for instance) be diverted, to studying the effects 
currents in the ocean have on the growth and decay of
density/salinity microstructure in specific localities. A
sound understanding of all facets of ocean currents will no
doubt prove to be a large step forward in completing man's
knowledge of the oceans. One facet of ocean currents which
has received limited attention in research studies is the
statistical properties of measured ocean-current speeds.
This is the specific subject area covered in this report.


B. MEASUREMENT OF CURRENT VELOCITIES AND SUBSEQUENT STATIS- 
TICAL STUDIES 


1. Measuring Ocean-Current Velocities

Two means have existed for directly determining current
velocities. One has been to place a stationary or
semi-stationary device in the water which recorded the flow
speed of water around the device. The second way has been to
record the set and drift of an object placed in the current.


Nearly all reported investigations in which statistical
procedures were used appeared to have based their analyses on
time-series current-meter data.
2. Categories of Statistical Studies 

A statistical approach to measured current velocities 
has been the subject or subsection of reported investigational
efforts in Russia, Canada, France, Norway and the United
States. It appears that statistical procedures, as applied 
to ocean currents, can be divided into two categories; those 
dealing in the study of the spectra of ocean currents, and 
those dealing with the distributional aspect of current 
velocities. Many of the studies have been concerned with 
relating spectral properties of ocean currents to internal 
waves, planetary waves, and theories of turbulence. Others, 
to a great extent, ignore the spectral properties of a set
of data and are concerned with the distributional properties 
of velocity components and speed values. 

3. Distribution-Oriented Statistical Studies

Russia and the United States have published the 
majority of the reports concerned with the statistical dis- 
tribution and analysis of ocean-current velocities. Webster 
[Ref. 1] described and discussed some elementary operations 
and data presentation techniques for the analysis of a long 
time-series of current-meter observations. Belyayev and 
Ozmidov [Ref. 2], using data measured at a semipermanent 
buoy station, derived empirical distributions of the
current-velocity components at ten depths, from 25 to 1200 m.
It was shown that these distributions differed substantially
from normal below the pycnocline and that the third and fourth
moments of the distributions changed abruptly in the
pycnocline.

Paquette [Ref. 3] concentrated his efforts on the
speeds of current-meter records. He showed that in nearly
80% of the time-series current-meter records checked, when 
the number of occurrences is plotted against the logarithm 
of the speed to base ten, the typically skewed distribution 
of speed becomes Gaussian at the 0.05 level of significance 
or greater. On long time-series records, the logarithmic
standard deviation appeared to range between 0.15 and 0.32. 
He also concluded that part of the distortion often observed 
in the tails of the probability distribution of the data
was presumably due to inherent current-meter errors.
Paquette's results concerning the distribution of the data 
were presented on cumulative probability plots on which the 
empirical distribution of the data was plotted along with a 
normal distribution. In general the appearance of the em- 
pirical distribution was quite close to normal, and when 
subjected to a Kolmogorov-Smirnov (K-S) test for normality, 
the results suggested this to be true. However, Paquette 
did not analyze the results further to show whether, on the 
average, the logarithmic-speed distribution produced a nearly
normal curve or some distribution close to normal but with 
systematic deviations from normality. 

Paquette briefly introduced and analyzed a limited
amount of drift-of-ship (DOS) data. The results indicated
that this type of data compared favorably with the majority of
the current-meter data. However, since DOS data was not
extensively analyzed, the results were not firm and no
comparison was made between moored current-meter data and DOS
data.


C. PURPOSE

The purpose of this paper is to investigate more closely 
the normality of the logarithmic-speed distribution as it is 
applied to ocean-current records and to analyze DOS data more
extensively. It will be shown that DOS data (after a
necessary alteration) and current-meter data compare quite
favorably and that the logarithmic-speed distributions of
both types of data can be considered to be symmetrically
distributed about their means with a high level of confidence.
The mean value of a group of logarithmic-speed distributions
is shown to have a slight systematic deviation probably due 
to transient phenomena. 

In order to extend the studies of current-meter time-series
data to a new area of the ocean and to add more samples
to the data base, eleven current-meter records from the
Coastal Upwelling Experiment (CUE) off the coast of Oregon 
were also analyzed. 

The basic approach in this presentation and analysis of 
results has been two-fold. One was to record and plot, at 
normalized deviations from the mean (NDM) of each cumulative 
probability distribution, the difference value between the
logarithmic distribution and a log-normal distribution. If
an empirical logarithmic distribution were truly normal, the
differences would be zero and a straight line through the
zero values of the plot would occur. The second procedure
was to apply known statistical measures and interrelationships
to parameters derived from the first four moments of
a distribution. Specifically these parameters are the mean,
standard deviation, coefficient of skewness, and coefficient
of kurtosis.
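
The first procedure can be illustrated with a short sketch. It is not the program used in this study: the function name `ndm_differences`, the particular set of nine NDM points, and the use of raw (ungrouped) speeds are all assumptions made for the example.

```python
import bisect
import math
from statistics import NormalDist


def ndm_differences(speeds, ndm_points=(-3, -2, -1, -0.5, 0, 0.5, 1, 2, 3)):
    """Observed minus normal cumulative probability of log10(speed) at each NDM.

    The default NDM points are an assumed set chosen for illustration.
    """
    logs = sorted(math.log10(v) for v in speeds if v > 0)
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / (n - 1))
    standard_normal = NormalDist()
    differences = {}
    for z in ndm_points:
        # Log-speed lying z standard deviations from the mean of the record.
        threshold = mean + z * sd
        observed = bisect.bisect_right(logs, threshold) / n
        differences[z] = observed - standard_normal.cdf(z)
    return differences
```

For a record whose logarithmic speeds were exactly normal, every returned difference would be close to zero, which corresponds to the straight line through the zero values described above.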







II. THE DATA


Data used in the statistical analysis and relationship 


investigations reported in this paper came from four sources. 


A. TIME-SERIES DATA
1. Sources of the Data

One source was moored current-meter data recorded by 
Woods Hole Oceanographic Institution [Refs. 4, 5, 6]. The 
second source was moored current-meter data recorded by 
Paquette and designated SCARF 1 through SCARF 7. These first 
two sources include 29 of the 43 time-series records used by 
Paquette [Ref. 3]. 

A third source was eleven sets of moored current- 
meter records furnished by Donald Bishop in the office of 
the Coastal Upwelling Experiment (CUE) at the University of 
Washington. This data was recorded by Oregon State University
at a rate of one speed record every 5 or 10 minutes.
TABLE I gives the basic statistical summary of the CUE data.
In reference to TABLE I, the sample identification number
provides an indication of the meter's location (these
locations are shown in Fig. 1). V is the arithmetic mean of
the speed. Log V gives the mean of the logarithmic-speed
distribution. σ is the arithmetic standard deviation while
σ_L stands for the standard deviation of the logarithmic-speed
distribution. ΔP_m is the maximum difference observed between
the cumulative probability of the logarithmic-speed
distribution and the cumulative probability of a log-normal
distribution over the same speed range. P gives the
cumulative probability of the empirical distribution at the
point where ΔP_m occurred.
2. Independence of Observations

Time-series data raise questions as to the independence
between consecutive data points, since most statistical
procedures are based on the independence of individual
data samples. Time-series data recorded at short intervals
are usually autocorrelated, which lowers the degree of
independence between data observations. The CUE data apparently
are highly autocorrelated. The autocorrelation coefficient
drops to 0.3 when using one observation every four hours.
However, the effects of decimation of the data were not
investigated. Paquette [Ref. 3] assumed that the number of
effective individual data points (needed for goodness-of-fit
tests) in the distributions he used could be obtained
by dividing the total number of observations by the number
of lags to get to an autocorrelation coefficient of 0.5, and
showed that decimation to this degree had negligible effect
on the mean and standard deviation of the distribution.
It will be assumed that the same procedure can be followed
with the CUE data.
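
One reading of this rule is sketched below: find the first lag at which the sample autocorrelation coefficient falls to 0.5 or less, and divide the record length by that lag. The function names and the simple (biased) autocorrelation estimator are assumptions for illustration and are not taken from Paquette's programs; a constant record, which has zero variance, is not handled.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of a sequence at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((v - mean) ** 2 for v in series)
    covariance_sum = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum


def effective_observations(series, threshold=0.5):
    """Return (effective count, lag) for the first lag with r <= threshold."""
    n = len(series)
    for lag in range(1, n // 2 + 1):
        if autocorrelation(series, lag) <= threshold:
            return n // lag, lag
    # The record never decorrelates: treat it as a single observation.
    return 1, n // 2
```

The effective count returned here is the number that would be entered in a goodness-of-fit table in place of the raw number of observations.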


B. DRIFT-OF-SHIP DATA
1. Source of the Data
The fourth source of data, and one not extensively
utilized by Paquette, came from the files of the National
Oceanographic Data Center (NODC) from their File H1-9. This
file is an extensive set of comparisons of dead-reckoning
positions and corresponding fixes covering the period
1904-1945. The difference between the dead-reckoning and
celestial or electronic fix is ascribed to a current which is
presumed constant over the hours and the many tens of miles
between fixes. NODC furnished computer-generated printouts
which included all information for Marsden Squares (MS) 
mi, 115, 116, 149, 150 and 151. Their locations are shown 
in Fig. 2. Selected data from these printouts were used in 
this analysis. Also shown in Fig. 2 were the basic loca- 
tions of the Woods Hole current meters whose data were used 
both by Paquette and in this thesis. The DOS data was re- 
ported by five-degree quadrants within each ten-degree 
Marsden Square, Fig. 3, and then by month, general current 
direction, and speed interval within each quadrant, Fig. 4. 
2. Independence of Observations

DOS data has been computed and reported by an un- 
countable number of people. The time of day the reports 
were made, the types of ships involved, the location of each 
ship, the wind and weather conditions were all unknown fac- 
tors which were assumed to have encompassed all possibilities 
over the 41 year reporting period. It is known that DOS data 
was not recorded when the reported wind speed exceeded
Beaufort 7 or seas exceeded 3.3m. With all these factors in 
mind, it was assumed the DOS data represented basically ran- 
dom independent samples in the areas from which information 


was reported. This did not exclude the possibility that
peculiarities in the speed classes may exist for various
reasons such as errors in grouping the data, bias factors
on the part of the reporting navigators, or the kind of
space and time averaging involved.

III. COMPUTER PROGRAMS


All the statistical parameters generated from the data 
used in this paper were obtained using computer programs on 
the Naval Postgraduate School IBM 360/67 digital computer. 
Table II provides a summary of the major programs utilized. 
Minor programs were written by the author to perform specific
tasks throughout the course of the investigation but these 
did not compute statistical parameters. 

Program HISTG classifies current-speed data into class
intervals and plots the resulting histogram on the line
printer. This program was used to generate statistics on
the OSU current-meter data, which was received on tape as
individual records, and on data sets keypunched on to computer
cards. CUDIS MOD3 and CURST2 accept data in histogram
form grouped both in even and uneven intervals. CUDIS MOD3
computes statistical information based on the assumption
that the counts in each speed-class interval are
concentrated at the center of the interval, and produces a
cumulative log-normal distribution and plots it on a prob- 
ability-paper scale. CURST2 computes statistical information 
based on the assumption that the number of counts in each 
speed-class interval are evenly distributed across the width 
of the interval. It does not produce a plot. Besides the 
information provided in Table II, CURST2 computes the third 
and fourth moments of a distribution and the coefficients of 


skewness and kurtosis. These are not generated by CUDIS MOD3. 
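
The practical difference between the two grouping assumptions can be seen in a small sketch. This is not the source of CUDIS MOD3 or CURST2; the function name `histogram_stats` and its arguments are hypothetical, the calculation is done on the speed scale rather than the logarithmic scale, and the variance uses the divisor N. Spreading the counts evenly across an interval of width w leaves the mean unchanged and adds w²/12 (the variance of a uniform distribution) to that interval's contribution to the variance.

```python
import math


def histogram_stats(edges, counts, spread="center"):
    """Mean and standard deviation of grouped data.

    spread="center": counts are concentrated at the interval midpoints.
    spread="uniform": counts are spread evenly across each interval.
    """
    total = sum(counts)
    intervals = list(zip(edges, edges[1:]))
    mids = [(lo + hi) / 2 for lo, hi in intervals]
    widths = [hi - lo for lo, hi in intervals]
    mean = sum(c * m for c, m in zip(counts, mids)) / total
    variance = sum(c * (m - mean) ** 2 for c, m in zip(counts, mids)) / total
    if spread == "uniform":
        # Within-interval variance of a uniform distribution is width**2 / 12.
        variance += sum(c * w ** 2 / 12 for c, w in zip(counts, widths)) / total
    return mean, math.sqrt(variance)
```

Because uneven intervals are accepted, the correction is applied interval by interval rather than as a single constant.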

IV. MOORED CURRENT-METER DATA


A. ANALYSIS APPROACH USED 

Paquette [Ref. 3] concluded that the current-speed dis- 
tributions were log-normal at a level-of-significance of 
0.05 or greater by testing each of the 43 series studied with 
the K-S statistic. He used the mean and standard deviation 
obtained from the data as estimates of these parameters for 
the parent population. However, the K-S statistic Paquette
used assumes the parameters of the parent population are not
estimated from the data. According to Lilliefors [Ref. 7],
when the parameters of the parent distribution are estimated
from the data, the probability of a type I error will be
significantly smaller than as given by tables of the K-S
statistic. Lilliefors provides a new table for the critical
values of the deviation for several useful α values. The
values used to construct Fig. 5 were obtained from this
table. The effective number of observations is along the
abscissa with the maximum permissible deviation values
plotted on the ordinate. Thus the results obtained by
Paquette are conservative in that his results are at a higher
level-of-significance than they should be.
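
The effect of estimating the parameters from the data can be sketched as follows. The coefficients 0.886 and 1.358 are the commonly quoted large-sample approximations (roughly N > 30) to the 0.05-level critical deviations for the Lilliefors and the standard K-S tables respectively; they are given here as an assumption for illustration and are not values read from Fig. 5. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_distance_to_fitted_normal(values):
    """Maximum gap between the empirical CDF and a normal CDF fitted to the data."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # Check the step of the empirical CDF on both sides of x.
        gap = max(gap, i / n - cdf, cdf - (i - 1) / n)
    return gap


def critical_value(n, parameters_estimated=True):
    """Approximate 0.05-level critical deviation for large n."""
    coefficient = 0.886 if parameters_estimated else 1.358
    return coefficient / math.sqrt(n)
```

For a logarithmic-speed record, `values` would be the base-ten logarithms of the speeds and `n` the effective number of observations. The smaller critical value that applies when the mean and standard deviation are estimated is what makes a test against the standard table conservative.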

All current-speed data used by Paquette [Ref. 3] and in
this thesis were generally in histogram form. It is realized
the K-S test was derived for ungrouped data and that its
behavior is less well understood when using grouped data.
Current-meter data were grouped in only one cm/sec intervals
and appeared more or less as continuous data. However, DOS
data was highly grouped and less acceptable for application 
of the K-S statistic. Therefore, it was assumed that the 
DOS data would give a larger maximum difference between 
cumulative distributions than ungrouped data (an assumption 
which seemed reasonable), and that the K-S test would give 
a reasonable result that was somewhat liberal (reject more 
than it would if the data were not grouped). More work on 
this subject is needed but is left for future studies. 

If the current-speed distributions are in
general log-normal, one might assume that the normalized-
logarithmic distributions derived from the many time series
ought to be comparable and members of an ensemble of
distributions. Then one may test the fit by examining the
deviations of the cumulative distribution function (C.D.F.)
of the data from the cumulative log-normal distribution at
a number of values of the normalized deviation of the
logarithmic speed, $(\mathrm{Log}\,V - \overline{\mathrm{Log}\,V})/\sigma_L$,
where Log V is the logarithm to the base ten of any speed
value V, $\overline{\mathrm{Log}\,V}$ is the mean of
the logarithmic-speed distribution, and $\sigma_L$ is the standard
deviation of this distribution. This technique has the
advantage of examining all of the series together, looking for
an overall systematic difference from the ideal and looking
at the distribution of the difference values at each
normalized deviation point selected.

Besides the goodness-of-fit to the normal cumulative
distribution curve, the coefficients of skewness and kurtosis
were examined as they relate to each other on a Pearson
diagram. These coefficients also might be expected to be
comparable if the curves are similar. However, as pointed
out by Pearson [Ref. 8], different distributions can have 
the same first four moments. These coefficients apparently 


have not been used previously in studies of currents. 


B. DEGREE OF NORMALITY OF TIME-SERIES LOGARITHMIC-SPEED DATA
1. Data Used and Presentation Methods 

The data used in this approach was part of the same
time-series data used by Paquette and included all the SCARF
data and all the Webster and Fofonoff data, 29 time-series
data sets in all. Figure 6 is a plot of the difference values
(observed minus predicted logarithmic cumulative probabilities)
for the 29 time-series data sets at nine normalized deviations
from the mean (NDM). Difference values are noted along the
abscissa while the nine NDM values selected are indicated
along the ordinate. Plus and minus three sigma units were
used as the limits of the NDM values because the time-series
records did not provide sufficient values for analysis beyond
these points. Bar plots of the difference values at
each NDM are given showing the range of values observed. 
A smooth curve was faired through the mean value at each 
NDM considered. 

Table III provides summary statistics of the data
used to construct Fig. 6. Not all of the 29 time-series
data sets extended out to the two and three sigma locations.
The last column of this table provides the results of a K-S
goodness-of-fit test for normality of the difference values
at each NDM assuming a mean value of zero. Under the
hypothesis that the log-speed transformation produces a normal
distribution from current-speed records, it is assumed that
difference values at each NDM are random and come from a 
nearly normal population whose mean is zero. 

It is readily apparent in Fig. 6 that the range of
difference values includes the zero value in all instances.
However, the distributions of the difference values are not
in general symmetric about zero. This is not too surprising
since any subsample drawn from a parent population will most
likely not possess the same mean as the parent population.

A smooth curve through the mean values at each NDM shows a
systematic "S" shape variation from the log-normal curve.
2. Significance of Observed Results

In order to determine the significance of the "S"
shape variation in Fig. 6, one must examine some of the
statistical values provided in Table III. To aid in this
examination, Table IV is given which shows some of the
computations and values required in the following analysis.
Columns 1, 2, and 7, among others, of Table IV are repeated
from Table III. Brooks
and Carruthers [Ref. 9] provide computations for the standard
error of the coefficient of skewness of any series of N random
numbers (p. 55), and the standard error of a single
observation from a sample of N observations (p. 40). "t" in
Table IV is the value for a "Student" t-distribution, ν is
the degrees-of-freedom for that distribution, and α is the
significance level obtained when entering "Student" t-tables
for a two-tailed level-of-significance test. The usage of
these values will be explained later.
a. Symmetry of the Data at Each NDM

It is noted in Table IV that at all but one of
the nine NDM's, the coefficient of skewness of the individual
sets of difference values is less than one. Brooks and
Carruthers [Ref. 9] point out that any set of N random numbers
will show a certain amount of skewness (p. 55); however, an
absolute value of the coefficient of skewness less than one
indicates data only moderately skewed (p. 56). They also
specify that the skewness can be considered real only when
the coefficient of skewness exceeds twice the value of the
standard error (p. 55). It seems only logical that the
larger the value of N, the better the confidence in these
statements. By comparison of columns three and five in
Table IV, it is seen that except for the NDM value of three
sigma, the majority of the coefficients of skewness fall
significantly short of being equal to twice the value of the
standard error. Since skewness is an indication of the
symmetry of a distribution about its mean, the indication from
Table IV is that at each NDM except three sigma, the
difference values are basically symmetrical about their means.

Since the data at the NDM values are basically
symmetrical about their means and since a review of the
histograms of the data shows in general normal-type
distributions, except at three sigma where the data exhibits a
definite "J" shaped distribution to the left, a test was
made to determine to what degree the data were normal about
the hypothesized mean of zero. A K-S test was conducted
with the standard deviation estimated from the data. The
results are given in the last column of Table III. The
figures given are the level of significance or α values of
the test obtained from Fig. 5. The amount of reduction in
the α value due to estimating only the standard deviation
from the data was not known, but it was assumed to be
significant and therefore Lilliefors' results were used. It
appears the normal hypothesis could be rejected on the basis
of the evidence from these data, at a significance level of
0.008 or below.

b. Significance of the Deviations of the Means 

As stated previously, it is a known fact that 
any subsample from a large population will most likely not 
have the same mean as the parent population. Brooks and 
Carruthers [Ref. 9, p. 65] demonstrate a method of testing
whether a mean M from a subsample differs significantly
from a postulated population mean M'. The test can be made
using the well-known "Student" t-distribution where the
t-value is computed by:

$$t = \frac{M - M'}{\sigma / \sqrt{N}}$$

σ is the estimate of the population standard deviation
derived from the sample, and the distribution of "t" is
associated with N-1 degrees-of-freedom. This test can be used
to test the significance of the deviation of the mean from
zero at the individual NDM's. At each of the NDM's the
value of M' is zero, and the values for the other computations
to derive "t" are given in Table IV. The test hypothesis
is that the subsample mean does not differ significantly
from zero.
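
As a minimal sketch of this computation, the function below returns the t-value, t = (M − M')/(σ/√N), and the degrees-of-freedom for one set of difference values. The function name is hypothetical, and σ is taken as the sample standard deviation with divisor N-1, which is an assumption about how the estimate was formed. The significance level would then be read from t-tables, as is done in this thesis.

```python
import math
from statistics import mean, stdev


def student_t(differences, postulated_mean=0.0):
    """Return (t, degrees-of-freedom) for a test of the mean against M'."""
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    t = (mean(differences) - postulated_mean) / standard_error
    return t, n - 1
```

For example, `student_t([0.01, 0.02, 0.03])` gives a t-value of about 3.46 with 2 degrees-of-freedom.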

Figure 7 is derived from the t-tables and can be 
used for the t-tests in this thesis. The "Student" t-value 
is given along the abscissa with level of significance on
the ordinate. The curves are for different values of
degrees-of-freedom. Enter with the t-value and
degrees-of-freedom and read off α on the ordinate.

We can infer therefore from the results in Table
IV that the departure from the mean of zero at each NDM value
is probably significant except possibly at NDM values of -2
and 0.5. These latter two means are near zero anyway and
are near crossing points in the curve. Therefore, the "S"
shaped curve in Fig. 6 is indeed most probably real, and not 
a sampling artifact. 


c. An Engineering Viewpoint of the Significance of 
Results 


Perhaps more important than significance in 
terms of probability -is the utility of this information from 
an engineering viewpoint. If one is concerned with the maximum
current to be expected on an object being placed in the
ocean, one is concerned with the high speed tails of the
distribution being correct. At a NDM of two sigma the normal
probability ought to be 0.9772. The maximum difference value
observed from the data was 0.023. This gives an error
that is not quite as great as one times the
residual probability remaining. At the NDM of three sigma
this error increases to over sixteen times the residual
probability remaining. Therefore, the data at the three
sigma value is unreliable for use, as will be shown below.
d. General Summary and Possible Errors 

It can be said that the "S" shape curve in Fig. 6
is most probably real as shown and not due to chance. In
general the type of curve represented by the smooth curve in
Fig. 6 is one which contains slightly fewer data points
below the mean and slightly more data points above the mean
than would be expected of normally distributed data. To
put Fig. 6 into a better perspective and to indicate just how
much deviation is being shown, a more familiar representation
of the CDF's of the curves in question is shown in Fig.
8. As can be seen, the maximum deviations are small and the
CDF's of the two distributions are almost identical.

Perhaps one could argue that the systematic
deviation in Fig. 6 could be caused by measurement errors or
errors in treating the data. This could possibly be true if
it were not for the fact that the data used for Fig. 6 came
from two separate sources and that a similar plot using only
the eleven sets of CUE data (which is yet a third source and
one which used a different type of current meter, the
Aanderaa, than the SCARF and Webster and Fofonoff data)
showed the same general variation. Therefore the reason for
this systematic variation is not clear. Some possible causes
could be the occurrence of events such as storms which may
produce anomalous water velocities for a substantial fraction
of the recording period, mooring transits, and excessive
oscillation of the buoy during the recording period. All of
these could conceivably produce the type of effect noticed.
e. Limit of Usefulness 

It was shown that at all NDM's, the difference
values obtained were basically symmetrical about their mean
and the histograms indicated possibilities of normality except
at the three sigma location. The distribution here was "J"
shaped trailing off to the left or towards higher negative
difference values. This says that at a NDM of three there
are in general fewer observations than in a normal
curve for the data. This is to be expected since the current
meters have a tendency to record fewer of the higher
speeds than actually occur, due to a coalescing of speed dots
on the recording film. Therefore it appears the NDM value of
three is beyond the usefulness of the current meter to provide
satisfactory data for analysis. Although current meters do
have a problem with stalling of the rotor at the low-speed end
of the scale, the data at a NDM value of minus three does not
indicate any problems, so it is assumed this value is within
the useful range of the current meters used. Since data obtained
for very high and very low speeds is suspect, no explicit
attention has been paid to this in the process of statistical
estimation. New "robust" procedures that account for such
data difficulties are described by Andrews [Ref. 10].

C. PEARSON DIAGRAM
1. Presentation Method

Another means of determining the type of distribution
data may represent is by use of a Pearson diagram. This is
a diagram on which is plotted the square of the coefficient
of skewness, β1, versus the coefficient of kurtosis, β2.
Pearson [Ref. 11] showed that different regions of the β1,
β2 space correspond to several different theoretical
distribution curves. Table V provides the β1 and β2 values for
the logarithmic time-series data previously considered plus
these values for the CUE data which will now be included for
analysis. Figure 9 is a Pearson diagram on which the β1, β2
values from Table V are plotted.

2. Consideration of Data Errors

The plotted points in Fig. 9 appear to show an excessive
spread. However, further investigation into comments
concerning the recording of the SCARF and Webster and Fofonoff
data showed that about 63% of the data sets having a β2 value
of 4.5 or below experienced marked quantization in speeds,
higher than normal speeds due to mooring transits, or excessive
buoy oscillations while in place. About 67% of the
distributions with values of β1 equal to 1.0 or greater showed
these same characteristics. Only one of the CUE data records
plotted in the region just discussed but no detailed
information on those records was readily available.

One of the errors mentioned above, high speeds due
to mooring transits, does add a quantity of high speed values
to a speed record. Transient phenomena such as storms and
influences from high speed current regimes can also cause
an excess of high speed current values. These high values
cause the distribution to be more positively skewed than
otherwise would be expected. These factors could distort
a speed record significantly if the total recording time is
small. Many of the records with large β1, β2 values were
also of relatively short time duration (less than a day).
Pearson [Ref. 8, p. 285] discusses this problem of long tails
on a distribution and shows that the contribution to moments
from the tails significantly increases as the order of the
moment increases. For instance, the contribution to the fourth
moment from areas in the outer 0.001 part of the tail, for one
of the distributions Pearson considers, is
about 41%. This contribution increases to 74.2% if a larger
outer part of the tail is considered. Since the β1 and β2
values depend on the second, third and fourth moments,
erroneous speed values which extend the tails of a distribution
will have a significant effect on where a distribution plots
on a Pearson diagram.

It would be impossible, without a highly detailed
study, to ascertain to what extent the three errors mentioned
influenced the data, but it is fairly obvious that some had
significant influences on the high values of β1 and β2
observed in Fig. 9. None of these problems were noted in data
which exhibited β1, β2 values smaller than the values given


above. 

3. Summary of Results

A normal curve will generate β1 and β2 values of 0.0
and 3.0 respectively. A grouping of points about the (0,3)
value would be an indication the log speed transformation
was a good fit. Except for the points previously mentioned,
the majority of the logarithmic time-series data plots
closely grouped about the (0,3) point. Again the indication
is that the log-normal approach to current-meter time-series
data produces a near-normal distribution. This diagram will
be used later to compare with the DOS data.
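
For reference, the coordinates of a point on a Pearson diagram follow from the central moments as β1 = μ3²/μ2³ and β2 = μ4/μ2². The sketch below computes them for ungrouped values; the function name is hypothetical and the moments use the divisor N, which may differ in detail from the CURST2 computation on grouped data.

```python
def pearson_betas(values):
    """Return (beta1, beta2) computed from the central moments of the values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    beta1 = m3 ** 2 / m2 ** 3  # square of the coefficient of skewness
    beta2 = m4 / m2 ** 2       # coefficient of kurtosis
    return beta1, beta2
```

Applied to the base-ten logarithms of the speeds in a record, a result near (0, 3) corresponds to a point near the normal-curve location on the diagram.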

D'Agostino and Pearson [Ref. 12] and Bowman [Ref. 13]
have published recent articles on the use of the β1, β2
statistics in testing normality of a data set. Their
procedures were not used in this thesis, but are referenced for


future use. 

V. DRIFT-OF-SHIP DATA


The attention of the analysis effort then shifted to the 
DOS data. The number of locations where current meters have 
recorded measurements is small in comparison to the total
area of the ocean. However, DOS information is available 
over a large percentage of both the Atlantic and Pacific 
Ocean. If a suitable distribution for these speeds could be
found, a method would be available for estimating the speeds
probabilistically. This requires also some way of estimating 
the second moment, a quantity which is not charted on the 
current charts. Since DOS data are somewhat different and 
probably more distorted than current-meter data, it is de- 
Smile ero usc CULrent-mMcter datd co neip) correct tne 
q1s tortmons . 

It is recognized that ocean currents usually decrease with
depth. This is an important part of the current prediction
problem to engineers. The present study does not enter into
this problem.


A. IRREGULARITIES IN DOS DATA 

DOS data used in this study appear to suffer some
irregularities at both ends of the speed spectrum. This is
discussed to some degree by Paquette. The four knot speed
class (see Fig. 3) includes all accepted speeds four knots
and greater. This has the effect of requiring one to place
a limit on the upper class interval in order to proceed
with distributional investigations. Herein enters one
possibility for error. Although the total number of occurrences
in this speed class is small in comparison to the total count,
the generated errors could be significant. A speed of 4.5
knots was chosen as the top limit for this analysis.

The lowest class interval also presents a problem.
Although described as "calm," its upper boundary is slightly
less than 0.1 knot. It is assumed that true zeros do not
exist, and the lower boundary is placed at 0.01 knot. Small
changes in this arbitrary choice have considerable effect
when the logarithmic transformation is made. Furthermore,
after transformation the class interval is too large to
properly represent the tail of the curve. A pictorially
nicer technique would be to distribute the counts in this
interval into several intervals according to a rule
consistent with the log-normal curve. This seemed like too much
tampering with the data and the above simple course was
followed.
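The sensitivity to the lower boundary can be illustrated with a
short sketch. It assumes base-10 logarithms and takes the upper
boundary of the "calm" class as exactly 0.1 knot for simplicity
(the text says only "slightly less than 0.1 knot"); the function
name and the alternative lower boundaries are hypothetical.

```python
import math

def log_class_summary(lower, upper):
    """Return (width, midpoint) of a speed class on a log10 scale."""
    lo, hi = math.log10(lower), math.log10(upper)
    return hi - lo, (lo + hi) / 2.0

# Compare the adopted 0.01-knot lower boundary with two alternatives.
for lower in (0.05, 0.01, 0.001):
    width, mid = log_class_summary(lower, 0.1)
    print("lower = %.3f kt: log width = %.2f, log midpoint = %.2f"
          % (lower, width, mid))
```

With the 0.01-knot boundary the "calm" class spans a full decade
in logarithmic speed, which is why it is too coarse to represent
the lower tail of the curve well.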

The extent to which wind effects are included in the recorded
speeds is impossible to ascertain. It could add to or
reduce from the true current speed. This would vary with wind
speed, direction of ship travel relative to the wind, and
[illegible]. No wind correction factors were entered.
As was mentioned, data taken when the winds were above Force
[illegible] were excluded from the data. While this reduces the
effect of excessive wind-drift of the ship, it also eliminates
the higher speeds of wind-driven current.

It is also to be noted that the DOS data are averages in
time and space. This averaging will smooth sharp high and
low peaks and will reduce the apparent numbers, especially
of the higher speeds.

Human error certainly enters into the results. In most
cases one expects this to be Gaussian error and to have little
effect except to increase the standard deviation slightly.
However, there appears to be a significant bias at the
low-speed end of the curve which will be discussed in the next
section.


B. A NECESSARY DATA ALTERATION 

There is an apparent anomaly in the "Calms" and 0.1 knot
speed classes. It is believed that this is an artifact
arising from a natural but unjustified pride in precision
of celestial and electronic fixes. There is nearly always
some scatter among the navigational lines of position. It
would be natural for the navigator to be biased toward those
which agreed with the dead-reckoning position. So it would
not be surprising to find more recorded "calms" than actually
occurred.

1. Alteration Indicated by e.c.d.f.

This appeared to be an explanation for the results
observed when the empirical cumulative distribution function
(e.c.d.f.) of the DOS data is plotted. The e.c.d.f. is a
plot of the i-th ordered value as ordinate against (i-½)/N
as abscissa. N is the total data count. In one-dimensional
samples, it provides an exhaustive representation of the data
under the following broad assumptions: (i) that the order
of the observations is immaterial; (ii) that there is no
classification of the observations, based on extraneous
considerations, which one wishes to employ; and (iii) if the
sample is non-random, then appropriate weights are specified.
Wilk and Gnanadesikan [Ref. 14] discuss the significant
advantages of using the e.c.d.f. in a descriptive test of data.
It is pointed out by them that the e.c.d.f. "is a robust
carrier of information on location, spread and shape, and an
effective indicator of peculiarities" (p. 2). Figure 10,
included as an example of the type of plot one might expect
to see from a log-normal data series exhibiting no readily
apparent data irregularities, is the e.c.d.f. for Webster
and Fofonoff measurement No. 1012 (WF 1012). One sees
basically a smooth flow of the data from one end to the other.
Plotted in Figure 11 is the e.c.d.f. of MS 115, quadrant 1,
month 10. The data flow appears smooth in the upper 60% of
the observations, but some peculiarity is evident in the
lower end of the data. It was felt two basic reasons caused
this to occur. One is the lack of resolution of speeds in
the region near zero. However, despite this factor, it
appears likely the main reason is that too many observations
occur in the "calm" class. If some were transferred to the
0.1 knot speed class, the e.c.d.f. plot would appear smoother.
Plots of the e.c.d.f. of other sets of DOS data showed
similar traits to varying degrees. It was not feasible to
investigate this characteristic of the data more thoroughly
at this time. Therefore, a partial correction was made by
arbitrarily shifting nine-tenths of the counts in the "calm"
interval into the 0.1 knot interval. Figure 12 is the e.c.d.f.
for the data of Fig. 11 altered in this way. It shows a much
smoother data fit and one which generally resembles the
current-meter data of Fig. 10, except for detail at
the lower end which may be obscured by the coarseness of the
subdivision into intervals.
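The e.c.d.f. construction described above can be sketched as
follows. The function name and the sample values are hypothetical;
the plotting position (i-½)/N is the one given in the text.

```python
def ecdf_points(values):
    """Return e.c.d.f. points as (abscissa, ordinate) pairs.

    The abscissa is (i - 0.5) / N and the ordinate is the i-th
    ordered value, with i running from 1 to N.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, v) for i, v in enumerate(ordered, start=1)]

# Hypothetical logarithmic speeds (illustrative only).
sample = [-0.30, -1.00, 0.10, -0.52, 0.48, -0.15]
for p, v in ecdf_points(sample):
    print("%.3f  %6.2f" % (p, v))
```

For grouped DOS counts, each class value would simply be repeated
as many times as its count before ordering, which is what produces
the stepped appearance at the low-speed end.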

2. Alteration Indicated by Probability Density Plot

Visual examination of the log-normal probability-paper
plots for the cases studied showed this arbitrary
change to be at least approximately correct. As an example,
Fig. 13 shows two separate logarithmic probability density
plots for MS 116-4-6 and MS 116-3-9. Each includes the
results of one unchanged data base and one corrected as
discussed above. The effect of the nine-tenths shift is very
dramatic and produces a more normal appearing probability
density plot. No further investigation was done to determine
whether the nine-tenths shift was an optimum alteration.
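The nine-tenths alteration amounts to a simple operation on the
class counts. The sketch below assumes the counts are held in a
list whose first two entries are the "calm" and 0.1 knot classes;
the function name and the counts are hypothetical.

```python
def shift_calms(counts, fraction=0.9):
    """Move `fraction` of the first-class count into the second class.

    The total number of observations is unchanged; the shifted
    counts may be fractional.
    """
    shifted = list(counts)
    moved = fraction * shifted[0]
    shifted[0] -= moved
    shifted[1] += moved
    return shifted

# Hypothetical counts for the "calm", 0.1, 0.2 and 0.3 knot classes.
original = [100, 50, 60, 40]
altered = shift_calms(original)
print(altered)
```

Leaving `fraction` as a parameter makes it easy to try other
values, since the text notes that nine-tenths was not shown to be
an optimum choice.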


C. DOS STATISTICAL PARAMETERS
1. Data Used and Presentation Methods 
Fifty different months of DOS data were selected for
analysis from areas generally off the northeast coast of the
United States. Each set of data was distributed over a
five-degree square. The squares and months were selected to
provide data from within and outside areas of expected high
current occurrence at different times of the year due to major
current systems. These fifty data sets were then altered by the
nine-tenths data shift and then analyzed with the aid of
the computer programs CUDIS MOD3 and CURST2.

Table VI provides the statistical summary of the
data generated by these programs. Columns one and two
identify the data sets and indicate the number of speed class
intervals into which the data is divided as well as the total
speed observations per data set. The arithmetic mean (V) and
standard deviation (σ) are given in columns three and four
respectively. Columns five through eight provide the
logarithmic statistics for each data set and include, in the
order given, mean (Log V), standard deviation (σ),
coefficient of skewness and coefficient of kurtosis. As defined
before, ΔPm is the maximum difference between the logarithmic-
speed cumulative probability and a log-normal cumulative
probability, while P is the value of the logarithmic-speed
cumulative probability where ΔPm occurs. For comparison,
these values are given for the curve that existed prior to
the nine-tenths alteration.

2. Comparison With Current-Meter Data 

Several results become readily apparent from Table
VI when compared with Paquette's work [Ref. 3] and Table I
of this report. The logarithmic standard deviation appears
to be grouped into narrow limits between 0.24 and 0.36.
This measurement for the current-meter data ranged between
[illegible] and 0.555. This is attributed to the grouping of the
data and the limit placed on the high-speed end of the
DOS data. All the DOS data sets except one show a small to
moderate negative skewness. As can be seen from Table V,
this was generally true for the current-meter data; however,
some exhibited skewness coefficients that were positive and
some that were negative but greater than minus one. The
coefficients of kurtosis for the DOS data ranged between
2.49 and 5.558, while for logarithmic current-meter
time-series data they ranged in value from 2.46 to 12.91. The
two sets of data look generally alike except the
current-meter data is considerably more variable. Some deviations
in the current-meter results are so extreme that
peculiarities in the data are suggested.
3. K-S Test of Normality of Each Data Set 

In order to produce a numerical measure of closeness
of fit of the logarithmic current-meter distributions to the
log-normal, Paquette [Ref. 3] applied the K-S statistic as
previously mentioned. The K-S statistic uses the maximum
deviation in absolute value between the empirical and
theoretical cumulative distribution (ΔPm in Table VI) and the
effective number of observations (number of independent
observations) to derive a level-of-significance for the fit.

The K-S test was applied to the DOS data in Table VI
using Lilliefors' results. The total number of observations
in each data set was used to enter Fig. 5. At an α level
of 0.05, the maximum permissible deviation was obtained from
the ordinate. If this value was greater than ΔPm in Table
VI, the normal hypothesis could not be rejected on the basis
of the evidence from this data at the significance level of
0.05. Ninety percent of the DOS data sets passed the K-S
test at a significance level of 0.05 or greater. The same
procedure was used to test the data sets prior to the
nine-tenths alteration. Nearly 88% of the unaltered DOS data sets
failed the K-S test at the 0.05 level of significance. It
therefore appears that the nine-tenths data alteration in the
first speed class produces a much more normal
logarithmic-speed distribution.
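A rough sketch of the decision rule follows. It is not the thesis
procedure: it works on ungrouped values rather than class counts,
and instead of reading Fig. 5 it uses Lilliefors' large-sample
approximation 0.886/√N for the 0.05 level, which applies for N
greater than about 30. All function names and data are
hypothetical.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def max_cdf_deviation(values):
    """Largest absolute gap between the empirical c.d.f. and a
    normal c.d.f. fitted with the sample mean and standard deviation."""
    ordered = sorted(values)
    n = len(ordered)
    mean = sum(ordered) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in ordered) / (n - 1))
    gap = 0.0
    for i, v in enumerate(ordered, start=1):
        f = normal_cdf(v, mean, sd)
        # Check the step just after and just before each ordered value.
        gap = max(gap, abs(i / n - f), abs(f - (i - 1) / n))
    return gap

def lilliefors_critical_05(n):
    """Approximate 0.05-level critical deviation, valid for n > 30."""
    return 0.886 / math.sqrt(n)

# Hypothetical speeds in knots, transformed to logarithms.
speeds = [0.1 * k for k in range(1, 41)]
log_speeds = [math.log10(s) for s in speeds]
delta_pm = max_cdf_deviation(log_speeds)
limit = lilliefors_critical_05(len(log_speeds))
print("max deviation = %.3f, permissible = %.3f" % (delta_pm, limit))
print("normality not rejected" if delta_pm < limit else "normality rejected")
```

As in the text, the normal hypothesis is retained only when the
permissible deviation exceeds the observed maximum deviation.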


D. DEGREE OF NORMALITY OF DOS LOGARITHMIC-SPEED DATA 
1. Graphical Presentation

The same procedures as used with the current-meter
data were applied to the DOS data. A bar plot of the
difference values between the observed and predicted cumulative
distributions at designated NDM values is presented in Fig.
14. The striking resemblance in shape to Fig. 6 is readily
apparent. Table VII provides a summary of the statistics
from the data used in constructing Fig. 14. This table
corresponds to Table III. The distribution of the difference
values at the individual NDM's is not entirely symmetric
about zero, and a curve smoothed through the mean value at
each NDM shows a slight "S" shaped systematic variation from
the normal curve.

The distribution represented by the smooth curve
through the mean values of the DOS difference values in Fig.
14 is nearly the same as the current-meter data except it is
more symmetrical in shape than the curve in Fig. 6.
2. Symmetry of the Data at Each NDM and Overall Limits
of Data Usefulness


Table VIII is like Table IV except that the figures
come from the DOS data under consideration. A review of the
coefficients of skewness in Table VIII shows that except at
the three sigma location, all values are significantly less
than unity, indicating only moderate skewness. All but two
of the coefficients of skewness are significantly less than
twice the standard error, indicating that their skewness is
probably not real and the data are nearly symmetric about
their means. NDM values of minus one and three showed signs
of real skewness in the distribution of difference values.
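The "twice the standard error" screening can be sketched as below.
The thesis does not state which expression was used for the
standard error of the skewness coefficient; the sketch assumes the
usual normal-theory formula, and the function names are
hypothetical.

```python
import math

def skewness_standard_error(n):
    """Normal-theory standard error of the sample skewness coefficient."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def skewness_probably_not_real(skewness, n):
    """True when |skewness| is below twice its standard error."""
    return abs(skewness) < 2.0 * skewness_standard_error(n)

# With fifty difference values the standard error is roughly 0.34,
# so coefficients smaller than about 0.67 in magnitude would be
# treated as consistent with symmetry.
print("%.3f" % skewness_standard_error(50))
print(skewness_probably_not_real(0.30, 50))
print(skewness_probably_not_real(-1.20, 50))
```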

A visual survey of the histograms of the difference
values at each NDM point revealed basically normal looking
distributions except at the three-sigma location, where the
distribution resembled a "J" shaped curve trailing off to
the left toward higher negative values. This indicated fewer
observations were observed in this area than expected. This
result should be expected since restrictions based on wind
force and sea state at the higher-speed end of the data
probably eliminated many of these higher-current values. A
similar phenomenon was observed with the current-meter data.
It is interesting to note that because of the restrictions
on the high speed ends of the current values both
current-meter and DOS records showed a similar exponential
distribution at the three sigma point with nearly identical values
for the coefficient of skewness and coefficient of kurtosis.
Unless some means is derived to correct for the lost data in
the high speed tail of the DOS distributions, such as
extending the upper limit of the 4.0 knot speed class, the three
sigma point, as with current-meters, appears to limit the
useful range of DOS data.

A K-S test was conducted at each NDM to test the
hypothesis that the data at these points were normal about
the theoretical mean of zero. The results are shown in the
right hand column of Table VII. One sees that there is
little or no likelihood that the data could be normal about
zero. This corresponds to the results obtained from
current-meter data as shown in Table III.

3. Significance of Deviations of the Means

A "Student" t-test was made to test the hypothesis
that at each NDM, the deviation of the data mean from the
theoretical mean of zero is not significant. The computations
and results of this test are given in Table VIII. Only two
locations passed this test with a level of significance
greater than 0.05. These two points, -0.5 sigma and one
sigma, are near crossing points of the smooth curve in Fig.
14 and therefore concurrence with the hypothesis would be
high at those points. As was found in Fig. 6, the deviations
causing the "S" shape curve in Fig. 14 are most likely real.
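The test at a single NDM can be sketched as a one-sample t
statistic on the difference values. The function name and the
numbers are hypothetical; the critical value of about 2.01 is the
standard two-sided 0.05 table value for 49 degrees of freedom and
would apply only if all fifty data sets contribute a difference
value at that NDM.

```python
import math

def one_sample_t(differences, mu0=0.0):
    """Return (t, degrees of freedom) for H0: mean of differences = mu0."""
    n = len(differences)
    mean = sum(differences) / n
    var = sum((d - mean) ** 2 for d in differences) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n), n - 1

# Hypothetical difference values at one NDM (illustrative only).
diffs = [0.012, -0.004, 0.020, 0.007, 0.015, -0.001, 0.009, 0.011]
t, dof = one_sample_t(diffs)
print("t = %.2f with %d degrees of freedom" % (t, dof))
```

A value of |t| above the tabulated critical value for the given
degrees of freedom indicates that the mean deviation from zero is
significant, which is the sense in which the "S" shape is called
real.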


E. PEARSON DIAGRAM USING DOS DATA 


A plot of the β₁, β₂ values for the fifty logarithmic DOS
data sets on a Pearson diagram is shown in Fig. 15. No
exceedingly large values of β₁ and β₂ were obtained from the
DOS distributions and no attempt was made to identify any
irregularities in the two distributions that had a β₂ value
greater than 4.5. It is readily apparent the DOS data is
closely grouped around the (0,3) point on the diagram and
compares most favorably to the majority of the current-meter
distributions in Fig. 9. The Pearson diagram has been used
as an indication of normality and as a tool for general
comparison between the two types of data included in this thesis.
The full utilization and subsequent implications one could
employ with regard to current-speed distributions through
use of the Pearson diagram are left to future work in this
area.


F. GENERAL BAR PLOT COMPARISON BETWEEN CURRENT METER AND DOS
DATA

Figure 16 is a composite of Fig. 6 and Fig. 14 plotted
together for comparison. It is readily apparent that the
variability in difference values is more extreme for
current-meter data than for DOS data. The most likely reasons for
this are that the DOS data are highly grouped, in general
contain only a moderate number of observations, and those
observations that are available have been averaged over time
and space due to the nature of the recording technique. It
is possible that the DOS data are a better measure of the
statistics of current measurements made over very long
periods than are the current-meter data, having gained more from
their randomized distribution over years of time than they
have lost from their various known distortions. It would
seem, therefore, that the difference in variability has little
significance.
Another apparent discrepancy in Fig. 16 is that it seems
the smooth curve through the means of the current-meter
difference values is offset from the DOS curve by about one
sigma unit. What probably has happened is that the lower
tail of the DOS data has been compressed towards the upper
tail by about one sigma unit. This also accounts for the
fact that no DOS data sets exhibited low-speed values which
extended out to three standard deviations from the mean. The
reason is that the grouping into class intervals centers the
speed of the lowest class interval higher than the several
low-speed class intervals in the current-meter distribution.
These low speeds become relatively large deviants from the
mean after the logarithmic transformation. It was necessary
to make an ad hoc readjustment of the DOS data in the
low-speed end while at the same time setting a rigid boundary on
the high-speed end. No such adjustments were necessary for
the current-meter data.


G. OTHER POSSIBLE VARIATIONS WITHIN DOS DATA 

Some of the variability noted in the current-speed data
used in this report could have come from differences in area
influences on the data and differences in seasonal influences
on the data. Because the DOS data was available in a large
quantity covering both area and time, an effort was made to
check for possible indications of these two types of
variability using the DOS data only. The procedure was to select
the data and plot it in the bar format similar to Fig. 6.
The number of distributions that could be used from the
fifty DOS data sets previously studied ranged between eight
and ten for each case cited below. Because the data base
was small, the plots generated were used to provide possible
indications of differences without proceeding further with
quantitative tests or in-depth reasoning for their existence.
1. Area Influence

Figure 17 is a plot using data from two separate
areas, MS 149-3 and MS 114-1, to check for possible area
influence on current speeds. Individual months in each area
were taken as separate data sets. The plot indicates the
current speeds in MS 149-3 are in general more variable
over a year's period than in MS 114-1. This is not
too surprising since MS 149-3 is east of Newfoundland and
probably more susceptible to storms and current variations;
the area is located in the vicinity where the Labrador and
Gulf Stream current regimes generally mix. MS 114-1 is in
the mid-Atlantic east of Charleston, South Carolina, where
active and variable current-speed conditions are not known
to exist. However, the general shape of the two curves is
about the same. Therefore, the indication is that possibly
an area influence on surface current speeds exists affecting
the variability of the speeds but not the general shape of
the distribution of the logarithmic-speed curve.

2. Seasonal Influence

Figure 18 is a plot using data from two separate
seasons of the year, winter and summer. For winter the month
of January was selected and the data randomly covered all MS
areas except MS [number illegible]. For summer, the month of
July was selected and the data covered the same MS areas and
quadrants from which the January data was taken. The plot
indicates that in the summer the currents are in general more
variable in magnitude than the winter currents; however, the
general shape of the curves is somewhat similar. The implications
of the variability noted cannot be readily related to any
current regimes. Since the factors which create and maintain
ocean currents are numerous and sometimes unpredictable, an
in-depth study would be needed to confirm these variability
results and then to establish reasons for their existence.
However, there are indications that seasons do influence the
variability but not the distribution of DOS current-speed
data.
VI. CONCLUSIONS


The logarithmic-speed transformation of both
current-meter and altered DOS speed records produces distributions
that as a group can be considered symmetrically distributed
about their means. The distribution of the mean values
appears log-normal. However, the mean of both types of data
exhibits a slight "S" shape systematic deviation which is
probably real. This systematic deviation appears likely to
be the result of external influences on the data which are
both natural and man-made. Indications are that elimination
of these influences would allow the mean of the logarithmic-
speed distributions of current data to be log-normal with
a high level of confidence.

DOS data compares quite favorably to current-meter data
and could be used to derive probability estimates for
surface current speeds in areas where no other data is available.

The limits of usefulness of current-meter data appear
to extend from NDM values of at least -3 sigma to somewhere
between two and three sigma. For DOS data these limits are
from at least -2 sigma to somewhere between two and three
sigma. Extrapolation beyond these limits is extremely
uncertain and, with present knowledge, should only be
considered if the consequences of a many-fold error in probability
are willingly accepted.

Indications are that seasonal and area influences on the
current speeds exist but that these influences are limited
Figure 6. Deviation of Logarithmic-Speed Distribution from
the Log-Normal Distribution. Mean and Range of 
23 to 29 Moored Current-Meter Time-Series Data